FOREWORD


This  study  was  conducted  in  support  of  Project  2806,  Task  280609, 
over  the  period  May  1964  through  January  1965. 

This report presents a formal comparison between two types of
classroom testing: (a) traditional multiple choice and (b) distribution
of uncertainty over choices.


This  Technical  Report  has  been  reviewed  and  is  approved. 
Director

ABSTRACT


This report compares two types of classroom testing in terms of their
efficacy in guiding instruction. One type of testing is the traditional
indirect method, based on the observation of choices. The other type is
the direct method, based on admissible probability measurement. The
general finding is that the direct methods always perform as well as,
and in most cases better than, the indirect methods. This deficiency in
the indirect method can be alleviated in theory by introducing redundancy
into the test, i.e., by asking the same question over and over again. The
performance of indirect methods depends critically upon the information
available to the instructor from other sources about the current state
of knowledge of each student. The performance of the direct methods is
unaffected by this. The gain in effectiveness achieved by using direct
methods must be balanced against the cost of using these new methods. A
direct method may require more student time per item than does an
indirect method. This, however, may be more than compensated for by the
requirement for redundancy when using the indirect method. In addition,
since a direct method does not require additional information from the
instructor as to the current state of knowledge of each student, the
possibility exists that much larger classes may be taught with no loss
in effectiveness, thus implying even further economic benefits from the
use of direct methods to guide classroom instruction.

DIRECT VS. INDIRECT ASSESSMENT OF SIMPLE
KNOWLEDGE STRUCTURES

H. Edward Massengill and Emir H. Shuford, Jr.

1. Statement of the Problem.

In  this  report  we  compare,  mathematically,  two  testing  methods  in 
a  well-defined  situation.  Our  purpose  is  to  determine  how  the  two  methods 
perform  in  the  matter  of  classifying  students  in  this  situation  and  to 
ascertain  some  of  the  distinguishing  characteristics  of  each  method. 

Since  our  results  are  logically  derived  from  explicitly  stated  assumptions, 
we  have  no  doubt  as  to  their  validity  for  the  specific  situation  we  are 
examining.  Further,  if  there  are  real-life  situations  which  are  equivalent 
to  the  one  we  define,  we  can  be  certain  that  our  results  will  apply  to 
these  situations.  But  we  will  not  be  concerned  here  in  seeking  to 
determine  the  extent  of  the  generality  of  the  situation  we  have  chosen. 

This  is  not  crucial  for  our  purpose.  This  should  not  be  taken  to  mean 
that  we  are  not  concerned  with  how  these  results  may  relate  to  more 
complex  situations.  On  the  contrary,  we  hope  that  the  findings  for 
this  situation  will  give  us  a  better  idea  of  what  to  look  for  in  more 
complex  situations.  And  we  are  confident  that  the  approach  we  have  used, 
namely  the  application  of  purposive  mathematics  (Massengill,  1964),  can 
be  extended  to  aid  us  in  the  analysis  of  these  more  complex  situations. 

The two methods which we will compare are the traditional indirect
method, IM,* and the direct method, DM. In the indirect method, the
student is given a question with two or more alternatives and asked to
choose the correct alternative. In the direct method, the student is
also given a question with two or more alternatives. But instead of being
asked to give the correct answer, he interacts with a measurement procedure
which outputs an inferred subjective probability distribution over the
alternatives.**

* We intend to deal with the indirect method in terms of decision theory
so that all of the information available to the person using this method
may be explicitly taken into account.

** See Shuford (1965), Shuford, Albert, & Massengill (1965), and Shuford
& Massengill (1965) for more information about these measurement procedures.

In order for the results of our comparison to be meaningful, we must
know exactly what assumptions are involved both in the student's response
process and in the two testing methods. To keep the assumptions simple,
and thereby make the arguments easier to follow, we will use a very simple,
but not unrealistic, testing situation. The test will consist of one
two-alternative question, or the same two-alternative question repeated
several times. The purpose of the test will be to help determine if a
student knows a particular concept.

The concept which we will test deals with the classification of two
objects, B and C, according to whether A=B or A=C. A given student has been
through a lesson in which the instructor has taught that A=B and that it is
not the case that A=C. If the student has learned A=B, we shall say that he
is trained, T. If he has learned that A=C, we shall say that he is
mistrained, mT. If he has learned nothing, we shall say that he is
untrained, uT.

At the close of the lesson, we want to classify the student as belonging
to one of the three categories: T, uT, or mT. The student's next lesson
will depend on how he is classified at the end of this lesson. Thus, if
he is classified as trained, he will go on to the next lesson. If he is
classified as untrained, the same lesson will be repeated. And if he is
classified as mistrained, he will be given the same lesson in a different
form. Because of the effect of the classification on the next step of the
student's training, we will use a payoff scheme for which a correct
classification is more valuable than an incorrect one and for which the
values of correct classifications are equal and the values of incorrect
classifications are equal. For the derivations of this report, a correct
classification will have a value of 1.0 and an incorrect one a value of 0.
Table 1 shows the payoff matrix for this decision problem.


TABLE 1

Payoff Matrix for the Instructor Who Is Classifying
Students According to Three Categories

                        CATEGORY
ACTS            T        uT       mT       EU(a_i)

a_1: T         1.0       0        0        P(T)
a_2: uT         0       1.0       0        P(uT)
a_3: mT         0        0       1.0       P(mT)

It should be noted that the particular utility structure we are using
makes the expected utility of an act equivalent to the expected proportion
of correct classifications. Thus, each statement which we make about
expected utility can also be interpreted in terms of expected proportion
of correct classifications. For the most part, we will use the term
expected utility since it is more general.

2. Classification Without Testing.


It  is  possible  for  the  instructor  to  classify  students  with  some 
accuracy  even  without  giving  them  a  formal  test.  After  all,  he  may  have 
observed  the  students  during  the  lesson  and  may  have  some  rather  strong 
feelings,  at  least  for  some  of  the  students,  about  which  category  a 
student  is  in.  Suppose,  for  example,  that  a  student  spent  the  whole 
period  of  the  lesson  working  on  some  other  assignment.  Then  the  instructor 
might  have  reason  to  believe  that  the  student  is  untrained.  Suppose  that 
another  student  dozed  during  the  lesson  and  only  came  to  life  during  the 
time the instructor was discussing A=C. But suppose that he did not
remain awake long enough to hear the instructor make the point that A=C
is not the case. If the instructor noticed this, then he might be
pretty sure that the student should be classified as mistrained. Finally,
suppose  that  a  third  student  had  listened  intently  to  the  instructor’s 
words  during  the  lesson.  Then  the  instructor  might  be  fairly  sure  that 
this  student  was  trained,  since  this  concept  should  be  easy  to  learn  if 
one  hears  it  explained.  Thus,  by  observing  the  students  during  the  lesson, 
the  instructor  might  be  able  to  do  a  fairly  good  job  of  classification 
without  testing  the  students. 

If  an  instructor  can  evaluate  his  subjective  probabilities  concerning 
which  category  a  student  is  in  and  express  them,  then  his  expected  utility 
can  be  determined  for  each  combination  of  prior  probability  values. 

Figure 1 shows the surface of all the possible combinations of the
instructor's prior probability values for the three categories. The
surface is divided into three sections, each of which is characterized by
the fact that one of the three categories has the maximum probability
for each point in that section. From Table 1, we see that the expected
utility of a given act is equal to one of these three probabilities. The
act which specifies the choice of the most probable category is the one
which maximizes expected utility, and the probability of the most probable
category is equivalent to the maximum expected utility. Using this
information, we can determine the expected utility associated with each
possible prior probability combination.

Figure 1. Surface showing all of the possible triplets [P(T), P(uT), P(mT)].
The value of any member of a triplet is >= 0 and P(T) + P(uT) + P(mT) = 1.0.
The surface is divided into three areas. In each area a different one of
the probabilities is a maximum for all points within that area. P(mT) is
constant along a given slant line.

Figure 2 represents this information in two dimensions. It is
helpful to conceptualize the information contained in Figure 2 in terms
of three dimensions. The base of the three-dimensional figure would be
identical to Figure 1. The third axis of the figure would contain $EU(a^*)$,
which is shown by the dashed lines in Figure 2.

Note that the minimum expected utility is 1/3 and that this occurs
only for the probability combination (1/3, 1/3, 1/3), i.e.,
P(T) = P(uT) = P(mT) = 1/3. The maximum expected utility, $EU(a^*) = 1.0$,
occurs at three points, the three corners of the base of the figure. The
values for the other points on the surface grow larger as they increase
in distance from the point (1/3, 1/3, 1/3).
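
The rule described above (choose the most probable category, and take its
prior probability as the expected utility) can be sketched in a few lines
of Python. This is only an illustrative sketch: the function name
`classify_without_testing` and the category labels used as dictionary keys
are assumptions introduced here and do not appear in the report.

```python
def classify_without_testing(p_t, p_ut, p_mt):
    """Return (best_category, expected_utility) for one prior triplet.

    Assumes the 1.0 / 0 payoff structure of Table 1, under which the
    expected utility of each act equals the prior probability of the
    category that the act names.
    """
    priors = {"T": p_t, "uT": p_ut, "mT": p_mt}
    if min(priors.values()) < 0 or abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must be non-negative and sum to 1.0")
    # The optimal act picks the category with the largest prior; ties are
    # broken by dictionary order, which does not change the utility.
    best = max(priors, key=priors.get)
    return best, priors[best]
```

For the prior triplet (.5, .4, .1) this sketch returns category T with an
expected utility of .5, and for (1/3, 1/3, 1/3) it returns the minimum
value of 1/3.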

If the instructor wants to decide whether or not to test and which
kind of test to use, he will obviously need to know how the expected
utility of testing for each of the two methods compares with the expected
utility of classifying without testing. Thus, we will need to obtain, for
the two test methods, the information analogous to that which we obtained
and summarized in Figure 2 for the decision to classify without testing.

Figure 2. ... which the lines pass. For example, $EU(a^*)$ for the
point (..., .1) is .5.

3. The Response Model for the Two Methods of Testing.

To  begin  with  we  will  examine  the  model  which  generates  the  responses 
for  the  situation  we  are  considering.  In  order  to  make  our  assumptions 
concerning  the  model  explicit,  we  will  describe  it  in  terms  of  a  task 
being  performed  by  a  machine.  In  applying  our  results  to  any  real-life 
situation,  the  main  concern  will  be  to  determine  if  the  model  is  a  good 
description  of  the  behavior  of  the  students  in  question.  It  might  be 
helpful  to  point  out  once  more  that  our  main  purpose  is  not  to  determine 
which,  if  any,  actual  situations  the  model  describes  but  rather  to  study 
the  performance  of  the  two  test  methods  for  the  model  in  an  attempt  to 
make  statements  about  the  value  of  each  method  and  to  get  some  idea  of  the 
important  characteristics  of  each  method. 

Figure 3 shows the response model in the form of a computer flow
chart. We will examine the logic of the flow chart step-by-step in order
to make the assumptions in our response model as clear as possible. First,
let us examine the location FACT. During the training period, one of three
things happens to FACT. It takes the value A=B, corresponding to the
category T; it takes the value A=C, corresponding to the category mT;
or it remains empty, corresponding to the category uT.

When the test is given, there are two pieces of information given as
input to the machine: the type of testing method, indirect or direct, and
the two-alternative question on the concept. The first step for the machine
is to compare the first alternative, $A_1$, with FACT. If $A_1$ and FACT are
identical, then alternative 1 is given a probability of one, i.e.,
$P(A_1) = 1.0$, and alternative 2 is given a probability of zero. If $A_1$
is the opposite of FACT, then the probabilities of the two alternatives are
the reverse of the case above. If FACT is empty, then each alternative is
given a probability of .5.

Figure 3. Flow chart of the model producing the responses for the two test
methods. (FACT contains A=B if trained, A=C if mistrained, and is empty if
untrained.)

Table 2 shows the two possible forms of our question, i.e., one with A=B
as the first alternative and the other with A=C as the first alternative.
Of course, the question could also be asked in true-false form. The
table also shows the three possible states and the results that our model
would yield for $P(A_i)$ for a given form of the question.

TABLE 2

P(A_i) Associated with Alternative A_i, where i = 1 or 2, for a Given
State of Knowledge and a Given Form of the Question

                                    CATEGORIES
FORM    ALTERNATIVE     TRAINED     UNTRAINED     MISTRAINED

F_1     A_1: A=B          1.0          .5             0
        A_2: A=C           0           .5            1.0

F_2     A_1: A=C           0           .5            1.0
        A_2: A=B          1.0          .5             0

It is immediately evident from Table 2 that if we could obtain $P(A_1)$
from a given student, we could correctly classify him with one question,
for this situation. For example, if we gave a student the first form of
the question, $F_1$, and found that $P(A_1) = 1.0$, we would know that he
was trained. If we found that $P(A_1) = .5$, we would know that he was
untrained. And if we found that $P(A_1) = 0$, we would know that he was
mistrained. Getting $P(A_1)$ is exactly the purpose of DM. Thus, for this
situation, DM would give us perfect classification with a one-item test.
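
The one-to-one correspondence between category and $P(A_1)$ in Table 2 can
be checked with a short Python sketch. The function names
`p_first_alternative` and `classify_from_probability` are hypothetical
names chosen for this illustration; the sketch assumes the measurement
procedure reports $P(A_1)$ exactly, as in the idealized model.

```python
CATEGORIES = ("T", "uT", "mT")

def p_first_alternative(category, form):
    """P(A_1) assigned by the response model (Table 2).

    category: "T", "uT", or "mT"; form: 1 (A_1 is A=B) or 2 (A_1 is A=C).
    """
    if category not in CATEGORIES or form not in (1, 2):
        raise ValueError("unknown category or form")
    if category == "uT":
        return 0.5  # FACT is empty, so both alternatives get .5
    fact = "A=B" if category == "T" else "A=C"
    first = "A=B" if form == 1 else "A=C"
    return 1.0 if fact == first else 0.0

def classify_from_probability(p_a1, form):
    """Invert Table 2: recover the category from the reported P(A_1)."""
    for category in CATEGORIES:
        if p_first_alternative(category, form) == p_a1:
            return category
    raise ValueError("P(A_1) must be 0, .5, or 1.0 in this model")
```

Because each category maps to a distinct value of $P(A_1)$ under either
form of the question, the inverse lookup never has to choose between two
candidate categories.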

After comparing $A_1$ with FACT, the machine then determines which
testing method is being used and acts accordingly. If DM is being
used, the machine goes to a probability measurement procedure, and it is
this measurement procedure which actually outputs the probabilities for
the two alternatives, after having inferred them from the results of an
interrogation of the machine.

If IM is being used, then the machine chooses one of the alternatives
as being correct. The machine makes its choice in terms of $P(A_1)$. This is
not an arbitrary procedure but is based on the machine's payoff matrix.

The payoff matrix is shown in Table 3. The utility structure of the
payoff matrix is "all-or-none," i.e., a correct outcome is more valuable
than an incorrect one, the correct outcomes all have equal values, and
the incorrect outcomes all have equal values. If we represent the value
of a correct outcome with 1.0 and the value of an incorrect outcome with 0,
then the maximum expected utility for a given question is equivalent to
the probability of the most probable alternative. Thus the optimal strategy
is to choose the most probable alternative as the correct alternative.
If both are equally probable, then an answer can be obtained by choosing
each alternative with a probability of .5. Thus, the branching procedure
in Figure 3 for the choice method is firmly based on decision theory
(Shuford & Massengill, 1965).

TABLE 3

The Subject's Payoff Matrix for the Indirect Method of Testing

                     CATEGORIES
ACTS              A_1        A_2        EU(a_i)

a_1: A_1          1.0         0         P(A_1)
a_2: A_2           0         1.0        P(A_2)
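
The choice rule just described for IM can be sketched as follows. This is
an illustrative Python sketch, not the authors' flow chart: the function
name `indirect_choice` and the optional `rng` argument are assumptions
added here so that the random tie-break can be made reproducible.

```python
import random

def indirect_choice(p_a1, rng=random):
    """Return 1 or 2: the alternative chosen under the all-or-none payoff.

    p_a1 is the subject's probability for alternative 1; the probability
    for alternative 2 is taken to be 1 - p_a1.
    """
    if not 0.0 <= p_a1 <= 1.0:
        raise ValueError("p_a1 must lie between 0 and 1")
    p_a2 = 1.0 - p_a1
    if p_a1 > p_a2:
        return 1
    if p_a2 > p_a1:
        return 2
    # Equal probabilities: choose each alternative with probability .5.
    return 1 if rng.random() < 0.5 else 2
```

A trained or mistrained subject therefore always gives the same answer,
while an untrained subject's answer is a coin flip, which is the source of
the ambiguity discussed for IM.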

Table 4 shows the alternative which will be chosen as the correct
alternative for a given form of the question and a given category of
knowledge. If we compare Tables 2 and 4, we will see that whereas DM
gives us unambiguous information about a subject's state of knowledge
with one question, IM does not. For instance, if Form 1 of the question
is asked and the student responds with $A_1$ as the correct answer, then we
know that he is not mistrained but we do not know, for certain, whether
he is trained or untrained. If he responds with $A_2$, we know that he is
not trained, but we do not know, for certain, whether he is untrained
or mistrained.

TABLE 4

The Alternative which will be Chosen as the Correct Alternative
for a Given Form of the Question and a Given Category of Knowledge.

                                     CATEGORY
FORM    ALTERNATIVE     TRAINED     UNTRAINED     MISTRAINED

F_1     A_1: A=B           X           .5
        A_2: A=C                       .5             X

F_2     A_1: A=C                       .5             X
        A_2: A=B           X           .5

On  the  basis  of  this  informal  analysis,  we  can  draw  some  conclusions 
about  the  two  types  of  testing  and  the  question  of  whether  or  not  to  test 
in  this  situation.  At  this  point,  we  can  say  that  DM  guarantees  correct 
classification  of  all  students  on  the  basis  of  one  question,  for  our 
situation.  If  no  test  is  used,  correct  classification  of  all  students  is 
only  guaranteed  at  three  points,  i.e.,  where  a  given  one  of  the  categories 
has  a  probability  of  1.0  of  occurring.  The  use  of  IM  improves  on  this 
somewhat  by  always  giving  correct  classification  not  only  when  a  given 
category  has  a  probability  of  1.0,  but  also  when  T  and  mT  together  have 
a probability of one. Thus, DM is better than either IM or classifying
without testing (CWT), for most conditions, whereas the latter two are
never better than DM. Also, the goodness of IM and CWT depends on the
values of the instructor's prior probabilities, whereas DM only depends on
the  data  observed.  This  gives  us  a  start  toward  our  purpose  of  evaluating 
the  two  testing  methods  for  the  situation  we  have  specified.  But  now  we 
would  like  to  know  how  much  better,  if  any,  DM  is  than  IM  for  each  possible 
condition.  This  will  enable  us  to  be  more  specific  in  our  comparison  of 
the  two  methods.  In  order  to  answer  the  question  of  how  much  better,  we 
will  need  to  get  information  for  IM  and  DM  which  is  analogous  to  the 
information for CWT summarized in Figure 2. This will require a formal
analysis of the two methods in terms of the situation we have defined.

4. Formal Analysis of the Two Testing Methods.

In  this  formal  analysis,  we  will  only  show  the  results  of  our 
derivations  with  comments  on  these  results.  To  enable  the  reader  who 
is  interested  to  get  some  idea  of  the  steps  involved  in  the  derivations, 
we include in Table 5 a summary of the basic decision-theoretic
relationships used here.*

TABLE 5

Summary of the Basic Decision-Theoretic Relationships to be Used
in the Derivations of this Report

Payoff matrix:

                            CATEGORIES
ACTS      S_1      S_2      ...     S_j      ...     S_m

a_1       u_11     u_12     ...     u_1j     ...     u_1m
a_2       u_21     u_22     ...     u_2j     ...     u_2m
 .          .        .                .                .
a_i       u_i1     u_i2     ...     u_ij     ...     u_im
 .          .        .                .                .
a_n       u_n1     u_n2     ...     u_nj     ...     u_nm

Prior probabilities: probability of the category $S_j$,

$$P(S_j), \quad \text{where } \sum_j P(S_j) = 1.0$$

Conditional probabilities: probability of data $d_k$ given category $S_j$,

$$P(d_k \mid S_j), \quad \text{where } \sum_k P(d_k \mid S_j) = 1.0$$

Unconditional probabilities: probability of data $d_k$,

$$P(d_k) = \sum_j P(S_j) P(d_k \mid S_j), \quad \text{where } \sum_k P(d_k) = 1.0$$

Posterior probabilities:

$$P(S_j \mid d_k) = \frac{P(S_j) P(d_k \mid S_j)}{P(d_k)}, \quad \text{where } \sum_j P(S_j \mid d_k) = 1.0$$

Expected utility of act $a_i$ given data $d_k$:

$$EU(a_i \mid d_k) = \sum_j P(S_j \mid d_k) u_{ij} = P(S_i \mid d_k), \text{ for the utility structure of Table 1.}$$

Maximum expected utility given data $d_k$:

$$EU(a^* \mid d_k) = \max_i EU(a_i \mid d_k)$$

Average expected utility of responding with the optimal act for each
data result:

$$EU(a^*) = \sum_k P(d_k) EU(a^* \mid d_k)$$
$$= \sum_k \max_j P(S_j) P(d_k \mid S_j), \text{ for the utility structure of Table 1,}$$
$$= \text{Expected proportion of correct classifications.}$$

* See Raiffa and Schlaifer (1961) for the mathematical background leading
to these relationships.

4.1  Direct  Method. 

At this point, we want to show formally that the average expected
utility yielded by DM is equal to 1.0 for all conditions in the situation
we are using. In other words, we want to show that DM will always lead to
the correct classification of students in this situation. And in the
process we will show that the instructor's prior probabilities are
irrelevant.

Prior probabilities. There are three prior probabilities: P(T),
P(uT), and P(mT) = 1.0 - P(T) - P(uT), i.e., the probability that a student
is trained, the probability that he is untrained, and the probability that
he is mistrained. The three probabilities sum to 1.0.

Data. For a given question, we can get one of three data results
from the student. These are the probabilities 1.0, .5, and 0. As is
evident from Table 2, the meaning of the data will depend on the form
of the question used. Our derivations will be in terms of the first
form. The results will be analogous for the second. Thus, the possible
data results are:

$$d_1: P(A_1) = 1.0,$$
$$d_2: P(A_1) = .5,$$
$$d_3: P(A_1) = 0.$$

Conditional probabilities. We will now state formally the relevant
conditional probabilities.

$$P(d_1 \mid T) = 1.0,$$
$$P(d_2 \mid uT) = 1.0,$$
$$P(d_3 \mid mT) = 1.0.$$

Actually, we can talk about the probability of each data result given
each category, but there is no need to do so in this case, since for
a given category, one data result has all of the probability. Thus,
if the student is trained, only $d_1$ can occur; if he is untrained, only
$d_2$ can occur; and if he is mistrained, only $d_3$ can occur.

Unconditional probability of the data. In this case, the probability
of a given data result occurring is equal to the prior probability of the
category which can yield that result. Thus,

$$P(d_1) = P(T) P(d_1 \mid T) + P(uT) P(d_1 \mid uT) + P(mT) P(d_1 \mid mT) = P(T),$$
$$P(d_2) = P(uT),$$
$$P(d_3) = P(mT).$$

Thus, for example, the probability that $d_1$ will occur is equal to the
probability that the student is trained, P(T).

Posterior probabilities. Now let us see how a particular data
result affects the prior probabilities.

$$P(T \mid d_1) = P(T) P(d_1 \mid T) / P(d_1) = 1.0,$$
$$P(uT \mid d_2) = 1.0,$$
$$P(mT \mid d_3) = 1.0.$$

It is clear that a particular category is certain to occur once a given
data result has been observed and that a different category is certain
to occur for each data result. Thus, a data result implies, unambiguously,
the state of knowledge of the student. As we can see from the equations,
the prior probabilities have absolutely no effect on the posterior
probabilities. One implication of this is that an instructor need have no
prior knowledge of the student taking the test in order to classify him
correctly in this situation.

Expected utility. Since for this situation the maximum expected
utility for a given data result is equivalent to the posterior probability
of the most probable category, only one expected utility is different from
0 for a given data result and that one is equal to 1.0. Thus, the maximum
expected utilities are:

$$EU(a_1 \mid d_1) = 1.0,$$
$$EU(a_2 \mid d_2) = 1.0,$$
$$EU(a_3 \mid d_3) = 1.0.$$

This means that the optimal acts are: $a_1$, if $d_1$ is observed; $a_2$,
if $d_2$ is observed; and $a_3$, if $d_3$ is observed. And from the
equations we see that the expected utility of the optimal act for each
possible data result is 1.0.

Average expected utility. Since an expected utility of 1.0 is
guaranteed regardless of the data result obtained, the instructor is
guaranteed an average expected utility of 1.0, i.e.,

$$EU(a^*) = 1.0.$$

This is true regardless of the values of the prior probabilities. Thus,
over the whole surface shown in Figure 2, the expected utility for a
one-item test is 1.0. This is an improvement over the approach of
classifying without testing except at the three corners of the triangle.
Of course, whether one should test with DM or classify without testing
depends on the values of the prior probabilities and the cost of testing.

4.2 Indirect Method.

We  have  seen  that  only  one  question  is  required  for  DM  in  order  that 
each  student  be  correctly  classified,  for  the  situation  under  discussion. 
We  have  also  seen,  informally,  that  there  are  situations  in  which  IM  does 
not  allow  for  perfect  classification,  at  least  with  one  question.  Thus, 
in  the  very  beginning,  we  will  include  the  idea  of  repeating  the  same 
question  a  number  of  times  to  see  if  this  might  improve  the  performance 
of  one  who  uses  IM.  In  deriving  the  results  for  IM,  we  have  assumed  that 
the  student's  answer  to  a  question  does  not  influence  his  answer  to  the 
same  question  when  it  is  asked  again.  In  other  words,  the  machine  always 
behaves as if a question has not been previously encountered. (We will
examine the implications of this assumption for IM in Section 5.1. For
DM, of course, we do not have to worry about repeating items, at least
in this situation, since one item is sufficient for perfect classification.)

Prior probabilities. These are the same as for DM.

Data. If a question is asked once, there are two possible pieces of
information: the student is correct, C, or he is not correct, ~C. From
Table 4, we can see that if a student is trained, the data result will
always be C; if he is mistrained, it will always be ~C. But if he is
untrained, it may be either. (It is this last possibility which brings
ambiguity into the situation.)

If we ask the question n times, we will get r C's and n-r ~C's. For
the trained student, we can only get the result r = n, i.e., n correct
answers out of the n times the question is asked. We will denote this
result as $C_n$. For the mistrained student, we can only get the data result
r = 0, i.e., no correct responses out of n questions. We will denote this
result as $C_0$. For the untrained student, we can get either of these two
results and also the result $C_{r^*}$, where $r^*$ may equal any integer
between 1 and n-1. Thus, we are interested in three data results for IM:
$C_n$, $C_0$, and $C_{r^*}$.

Conditional probabilities of the data. The following are the relevant
conditional probabilities for any one trial:

$$P(C \mid T) = 1.0,$$
$$P(C \mid uT) = .5, \quad P(\sim C \mid uT) = .5,$$
$$P(\sim C \mid mT) = 1.0.$$

Since the trials are independent, we can obtain the conditional probabilities
for a given category of knowledge, S, by use of the binomial probability
equation. Thus,

$$P(C_r \mid S) = \binom{n}{r} [P(C \mid S)]^{r} [P(\sim C \mid S)]^{n-r}.$$

As we have seen, only one data result, $C_n$, has any probability for the
trained student:

$$P(C_n \mid T) = 1.0.$$

Likewise, only one data result, $C_0$, has any probability for the mistrained
student:

$$P(C_0 \mid mT) = 1.0.$$

For the untrained student, however, each of the three data results has
some probability for a finite n:

$$P(C_n \mid uT) = \frac{1}{2^n},$$
$$P(C_0 \mid uT) = \frac{1}{2^n},$$
$$P(C_{r^*} \mid uT) = 1 - [P(C_n \mid uT) + P(C_0 \mid uT)] = \frac{2^{n-1} - 1}{2^{n-1}}.$$

We can see immediately that as n approaches infinity, the last probability
approaches 1.0 and the total situation tends to the one we had for DM, i.e.,
a particular data result implies a particular classification.
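
The binomial step above can be checked with exact arithmetic. The
following Python sketch is illustrative only; the names `P_CORRECT`,
`p_r_correct`, and `grouped_results` are assumptions introduced here, and
the one-trial probabilities are those stated in the text.

```python
from fractions import Fraction
from math import comb

# One-trial probabilities of a correct answer, P(C | S), from the text.
P_CORRECT = {"T": Fraction(1), "uT": Fraction(1, 2), "mT": Fraction(0)}

def p_r_correct(r, n, category):
    """P(C_r | S): probability of exactly r correct answers in n trials."""
    p = P_CORRECT[category]
    return comb(n, r) * p**r * (1 - p)**(n - r)

def grouped_results(n, category):
    """Probabilities of the three data results C_n, C_0, and C_r*."""
    return {
        "C_n": p_r_correct(n, n, category),
        "C_0": p_r_correct(0, n, category),
        # C_r* collects every mixed outcome with 0 < r < n.
        "C_r*": sum(p_r_correct(r, n, category) for r in range(1, n)),
    }
```

For an untrained student and n = 3 the sketch gives 1/8, 1/8, and 3/4,
which agrees with the closed forms $1/2^n$ and $(2^{n-1} - 1)/2^{n-1}$.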

Unconditional probabilities. Now we want to look at the probability
that a given data result will occur.

    P(C_n) = P(T) + (1/2^n) P(uT)
           → P(T) as n → ∞,

    P(C_r*) = [(2^{n-1} - 1)/2^{n-1}] P(uT)
            → P(uT) as n → ∞,

    P(C_0) = [(1 - 2^n)/2^n] P(uT) + 1.0 - P(T)
           → P(mT) as n → ∞.

Here, again, as n approaches infinity, the values of these probabilities
approach the same values they had for DM.

It should be noted that for n=1,

    P(C_r*) = 0,

since for n=1 r must equal 0 or 1. In other words, only two of the data
results are possible when n=1.
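The unconditional probabilities can be tabulated for any prior with a few lines of Python. This is only a sketch: the function name `data_probs` and its argument order are assumptions made for illustration, and P(mT) is taken as 1.0 - P(T) - P(uT).

```python
def data_probs(n, p_t, p_ut):
    """Unconditional probabilities of C_n, C_r* and C_0 for the prior (P(T), P(uT), P(mT))."""
    p_mt = 1.0 - p_t - p_ut
    half_n = 0.5**n  # P(C_n|uT) = P(C_0|uT) = 1/2^n
    return {
        "C_n": p_t + half_n * p_ut,
        "C_r*": (1.0 - 2.0 * half_n) * p_ut,
        "C_0": half_n * p_ut + p_mt,
    }


one_item = data_probs(1, 0.25, 0.50)     # C_r* has probability 0 when n = 1
many_items = data_probs(20, 0.25, 0.50)  # close to the prior itself
```

The expression used here for P(C_0) is the same quantity as in the text, written with P(mT) in place of 1.0 - P(T) - P(uT).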

Posterior probabilities. Now we want to see how a particular data
result affects the probabilities of the categories. First, let us see
what happens when C_n is observed.

    P(T|C_n) = P(T) / [P(T) + (1/2^n) P(uT)]
             → 1.0 as n → ∞,

    P(uT|C_n) = (1/2^n) P(uT) / [P(T) + (1/2^n) P(uT)]
              → 0 as n → ∞,

    P(mT|C_n) = 0.

Thus, as we have already observed, only T and uT have any probability when
C_n is observed. And as n approaches infinity, only T has any.

When C_0 is observed,

    P(T|C_0) = 0,

    P(uT|C_0) = (1/2^n) P(uT) / {[(1 - 2^n)/2^n] P(uT) + 1.0 - P(T)}
              → 0 as n → ∞,

    P(mT|C_0) = P(mT) / {[(1 - 2^n)/2^n] P(uT) + 1.0 - P(T)}
              → 1.0 as n → ∞.

Only uT and mT have any probability when C_0 is observed and, as n
approaches infinity, only mT has any.

When C_r* is observed,

    P(uT|C_r*) = 1.0.

Thus, as n approaches infinity, the posterior probabilities take on the
same values as they did for DM.
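The posterior probabilities follow from Bayes' rule applied to the conditional probabilities of the three data results. A minimal Python sketch is given below; the function name and the tuple ordering (T, uT, mT) are illustrative assumptions.

```python
def posteriors(n, p_t, p_ut):
    """Posterior (P(T|d), P(uT|d), P(mT|d)) for each data result d that can occur."""
    prior = (p_t, p_ut, 1.0 - p_t - p_ut)
    half_n = 0.5**n
    # Conditional probabilities P(d|S) for S in the order (T, uT, mT).
    likelihood = {
        "C_n": (1.0, half_n, 0.0),
        "C_r*": (0.0, 1.0 - 2.0 * half_n, 0.0),
        "C_0": (0.0, half_n, 1.0),
    }
    result = {}
    for d, like in likelihood.items():
        joint = [l * p for l, p in zip(like, prior)]
        total = sum(joint)  # unconditional probability P(d)
        if total > 0:  # skip impossible results, e.g. C_r* when n = 1
            result[d] = tuple(j / total for j in joint)
    return result


post_n1 = posteriors(1, 0.25, 0.50)
post_n10 = posteriors(10, 0.25, 0.50)
```

With the prior (.25, .50, .25) and n=1, a correct answer leaves T and uT equally probable; with n=10 the posterior for T given C_n is already very close to 1.0.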

Optimal acts. For the utility structure of Table 1, the optimal
act for a given data result depends on which of the posterior probabilities
is largest. For the outcome C_n, the optimal strategy is to choose a_1 when

    P(T|C_n) > P(uT|C_n),

i.e., when

    P(uT) < 2^n P(T) = Y,

and to choose a_2 when the inequality is reversed.

For the outcome C_0, the optimal strategy is to choose a_2 when

    P(uT|C_0) > P(mT|C_0),

i.e., when

    P(uT) > 2^n/(2^n + 1) - [2^n/(2^n + 1)] P(T) = Z,

and to choose a_3 when the inequality is reversed.

For the outcome C_r*, the optimal strategy is always to choose a_2.

Thus we see that the optimal act given C_n or C_0 depends not only
on the data observed but also on the relationship between P(uT) and
P(T). There are four different possible relationships between P(uT) and
P(T). These are shown as the row headings of Table 6. For a given one of
these relationships, a given data result determines the optimal act. For
example, when Z < P(uT) < Y, a_1 is the optimal act when C_n occurs. We can use
these relationships to divide the area shown in Figure 1 into four sections
S_1, S_2, S_3, and S_4. Each of the four sections is characterized by one of
the rows in Table 6, i.e., by a certain pattern for the optimal acts given
the data.
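The decision rule above reduces to two threshold comparisons. The Python sketch below is one way to express it; the function name, the string labels for the acts, and the treatment of exact ties (where either act is optimal) are assumptions made for illustration.

```python
def optimal_acts(n, p_t, p_ut):
    """Optimal act for each data result under the all-or-none utility structure."""
    y = 2**n * p_t                        # threshold Y, used when C_n is observed
    z = 2**n / (2**n + 1) * (1.0 - p_t)   # threshold Z, used when C_0 is observed
    return {
        "C_n": "a1" if p_ut < y else "a2",   # ties resolved toward a2 (arbitrary)
        "C_r*": "a2",                        # only uT can produce C_r*
        "C_0": "a2" if p_ut > z else "a3",   # ties resolved toward a3 (arbitrary)
    }


pattern = optimal_acts(1, 0.9, 0.1)
```

For the prior (.9, .1, .0) and n=1, Z is about .067 and Y is 1.8, so Z < P(uT) < Y and the pattern is a_1, a_2, a_2.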

TABLE 6

                                   DATA
    SECTION                  C_n     C_r*    C_0

    S_1: Z < P(uT) < Y       a_1     a_2     a_2
    S_2: P(uT) < Y, Z        a_1     a_2     a_3
    S_3: P(uT) > Y, Z        a_2     a_2     a_2
    S_4: Z > P(uT) > Y       a_2     a_2     a_3

    Y = 2^n P(T);  Z = 2^n/(2^n + 1) - [2^n/(2^n + 1)] P(T)

Figure 4. The division, for n=1, of the surface of possible prior probability
combinations according to the pattern of optimal acts for the
possible sample results. For any section, the acts listed are
optimal given C_n and C_0, respectively.

Figure 4 shows the surface of possible prior probabilities divided
into four sections as a function of the relationship between P(uT) and
P(T), for n=1. Each of the sections corresponds to one of the rows in
Table 6. For example, the area labelled S_2 corresponds to the second row
in the table. The line in the figure labelled C_n represents the dividing
line through the surface for the two possible optimal acts when C_n is
observed. For the points above that line,

    P(uT) > 2^n P(T),

i.e.,

    P(T|C_n) < P(uT|C_n),

and, thus, a_2 should be chosen. The inequality is reversed for the points
below that line and so a_1 should be chosen. Similarly, the line labelled
C_0 represents the dividing line for the case when C_0 is observed. As we
have seen, when C_r* is observed, a_2 is always chosen regardless of the
relationship between P(T) and P(uT). Remember that these results are for
n=1. We can divide the surface in analogous fashion for each possible
value of n. Figure 5 shows the divisions ranging from n=1 to n=4. Notice
that as n gets larger, the section of the surface for which it is
optimal to take a_1, a_2, a_3, respectively, for the data results C_n, C_r*, C_0
gets larger. And notice that as n approaches infinity, the dashed line
approaches the P(uT) axis and the solid line approaches the right hand
boundary of the surface, i.e., there is only one optimal strategy for
each data result regardless of the prior probabilities. Thus as n → ∞, IM
gives results for all conditions which are equivalent to those given by
DM for one item.
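The way a fixed prior moves between sections as n grows can be illustrated with a short Python sketch. The function name `section`, the handling of priors lying exactly on a dividing line, and the example prior are assumptions chosen for illustration only.

```python
def section(n, p_t, p_ut):
    """Section of Table 6 (as a label 'S1'..'S4') containing the prior for an n-item test."""
    y = 2**n * p_t
    z = 2**n / (2**n + 1) * (1.0 - p_t)
    if p_ut < y:
        return "S1" if p_ut > z else "S2"
    return "S3" if p_ut > z else "S4"


# A hypothetical prior (P(T), P(uT), P(mT)) = (.1, .7, .2) followed over n = 1, 2, 3.
path = [section(n, 0.1, 0.7) for n in (1, 2, 3)]
```

For this prior the sequence is S_3, S_4, S_2: with one item the instructor would call the student untrained whatever the answer, while with three items the pattern a_1, a_2, a_3 is already optimal.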

Figure 5. The changes in the four divisions of the probability combination
surface as a function of n.

Average expected utility. We can find the average expected utility for
any point in any of the sections. To do this, we simply weight the expected
utility of the optimal act given the data by the probability of the data,
for each data result, and then sum over these weighted results. This gives
us the average expected utility, given P(T) and P(uT), of choosing the
optimal act for each data result. Thus, for the four sections of our surface,

    EU(a*|S_1) = [(2^n - 1)/2^n] P(uT) + P(T)
               → P(uT) + P(T) as n → ∞,

    EU(a*|S_2) = -[1/2^{n-1}] P(uT) + 1.0
               → 1.0 as n → ∞,

    EU(a*|S_3) = P(uT),

    EU(a*|S_4) = -(1/2^n) P(uT) + 1.0 - P(T)
               → 1.0 - P(T) as n → ∞.
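The weighting procedure described above can be sketched in Python without reference to the sections: with a 0/1 payoff, the optimal act for a data result collects the largest joint probability P(d, S). The function name is an assumption, and the sketch is restricted to n of at least 1.

```python
def average_expected_utility(n, p_t, p_ut):
    """Average expected utility of taking the optimal act for every data result (n >= 1)."""
    p_mt = 1.0 - p_t - p_ut
    half_n = 0.5**n
    # Joint probabilities P(d, S) for d in (C_n, C_r*, C_0) and S in (T, uT, mT).
    joint = [
        (p_t, half_n * p_ut, 0.0),
        (0.0, (1.0 - 2.0 * half_n) * p_ut, 0.0),
        (0.0, half_n * p_ut, p_mt),
    ]
    # The optimal act for each d is the category with the largest joint term.
    return sum(max(row) for row in joint)


eu_s1 = average_expected_utility(1, 0.9, 0.1)    # prior in S_1
eu_min = average_expected_utility(1, 0.25, 0.5)  # crossing point of the two lines
```

For the prior (.9, .1, .0) this agrees with the S_1 equation, .5(.1) + .9 = .95, and for (.25, .50, .25) it gives .5.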

Figure 6 shows the average expected utility, for n=1, for selected points
on the prior probability surface. The lowest EU in the figure is .5. This
lowest value only occurs for one point, (.25, .50, .25). As we go out from
this point, EU gets larger. Compare these results with Figure 2, i.e.,
the case in which no test is given. There the worst possible EU is 1/3.
This occurs only for the point (1/3, 1/3, 1/3). When no test is given,
there are only three points, the three corners, for which EU(a*) = 1.0.
These are the three possible cases for which two of the states have a
probability of 0 while the remaining state has a probability of 1.0. But
notice that when a one-item IM test is given, besides these three
points, there is a whole line, the P(T) axis, which gives an average
expected utility of 1.0. These are the cases for which P(uT)=0, i.e., the
student is either trained or mistrained. The reason the average expected
utility is 1.0 for these cases is that the data discriminates perfectly
when only T and mT are possible. In other words, a correct answer implies
that the student is trained, while an incorrect answer implies that he is
mistrained. Consult Table 4 to confirm this point.

Figure 6. The average expected utility for selected points on the probability
combination surface for n=1.

If we were to draw a figure analogous to Figure 6 for each value of
n, we would see that the minimum average expected utility for a given n
would be at the point where the two dividing lines cross. The coordinates
of this point are

    P(T) = 1/(2^n + 2),  P(uT) = 2^n/(2^n + 2),

and the average expected utility for this point is

    2^n/(2^n + 2).

Table 7 shows the value of the minimum average expected utility for
selected n's. Note that as n approaches infinity, the minimum average
expected utility approaches 1.0. Also note that even for eight questions,
the minimum value is already above .99.
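The crossing point and the minimum can be evaluated directly for any n. The Python sketch below does so; the function names are assumptions introduced for illustration.

```python
def crossing_point(n):
    """Prior (P(T), P(uT), P(mT)) at which the lines P(uT) = Y and P(uT) = Z intersect."""
    p_t = 1 / (2**n + 2)
    p_ut = 2**n / (2**n + 2)
    return p_t, p_ut, 1.0 - p_t - p_ut


def minimum_average_eu(n):
    """Minimum average expected utility of an n-item IM test, 2^n / (2^n + 2)."""
    return 2**n / (2**n + 2)


minimum_by_n = {n: round(minimum_average_eu(n), 3) for n in range(9)}
```

For n=1 the crossing point is (.25, .50, .25) with a minimum of .5, and for n=0 the expression reduces to 1/3, the worst case when no test is given.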


TABLE 7

The Minimum Average Expected Utility for the IM Test for Selected Values of n.

     n      Minimum Average Expected Utility = 2^n/(2^n + 2)

     0      .333...
     1      .500
     2      .666...
     3      .800
     4      .888...
     5      .941
     6      ~.970
     7      ~.985
     8      ~.992
     .
     .
     .
     ∞      1.000


5. The Effectiveness of the Two Methods


Before  we  discuss  the  effectiveness  of  the  two  methods,  it  should 
be  reiterated  that  the  conclusions  which  we  draw  are  in  terms  of  the 
specific  response  model  we  have  assumed.  Some  or  all  of  our  conclusions 
may be valid in more complex situations, but that is a matter for further
investigation.

To  put  our  discussion  of  effectiveness  into  perspective,  we  will 
restate  the  main  assumptions  that  have  been  made.  It  is  assumed  in  the 
response  model  we  are  using  that  a  student  has  probabilities  for  the 
alternatives  of  a  question  and  that  the  values  of  these  probabilities 
depend  in  a  specific  way  on  the  state  of  his  training.  For  the  direct 
method  of  testing,  it  is  assumed  that  these  probabilities  can  be  inferred 
by  a  measurement  procedure.  For  the  indirect  method,  it  is  assumed 
that  the  student  uses  them  with  an  all-or-none  payoff  function  to  choose 
the  alternative  which  will  maximize  his  expected  utility. 

An  all-or-none  payoff  function  has  also  been  assumed  for  the 
instructor  who  is  making  the  classifications.  This  means  that  the 
expected  utility  of  the  instructor  can  also  be  interpreted  as  the 
expected proportion of correct classifications, and, of course, that
maximization  of  expected  utility  can  be  interpreted  as  maximization 
of  expected  proportion  of  correct  classifications. 

The question of the effectiveness of the two methods can be looked
at from two points of view. One is from the point of view of the
instructor who is, at a given moment, classifying a student. The other
is from the point of view of some outside agent who knows the true
state of each student and who can thus evaluate the instructor's
performance in terms of the actual state of each student's knowledge.
For both points of view, average expected utility is used as the measure
of effectiveness.

5.1 Effectiveness from the Instructor's Point of View

First,  we  will  look  at  effectiveness  from  the  point  of  view  of  the 
instructor.  Once  the  instructor  has  assigned  prior  probabilities*  to 
the  three  states  for  a  particular  student,  and  given  that  he  accepts  the 
response  model  and  the  payoff  structure  we  have  specified,  the  results  of 
our  derivations  furnish  him  with  an  expected  utility  for  each  act  given  each 
possible  data  result  as  well  as  average  expected  utilities  for  responding 
with  the  optimal  act  for  each  data  result.  Thus  he  can  evaluate  the 
effectiveness  of  each  method  in  terms  of  its  average  expected  utility, 

EU(a*). 

Having related the results of our derivations with effectiveness,
from the instructor's point of view, let us briefly review CWT, IM, and
DM in terms of EU. For classification without testing, CWT, EU ranges
from .33... to 1.0. For IM, the range of EU depends on n. Table 7
shows the lower bound of the range for various values of n. The upper
bound for IM is 1.0 regardless of n. For DM, EU is a constant, 1.0.
Thus, DM has the narrowest range of EU. And aside from the cost of using
the methods, DM is better than or equal to either CWT or IM for all
conditions. Of course, a person choosing between the methods would
take cost into account. We do not do it here because it is not relevant
to our arguments, but our results are in a form which will enable anyone
who is interested to do so.

* These probabilities are prior to the results of the testing but may
include various types of non-test information about the student.

Classifying  without  testing  gives  an  EU  of  1.0  at  three  points;  the 
points  for  which  a  particular  state  is  given  a  probability  of  1.0.  IM 
gives  an  EU  of  1.0  at  these  points  as  well  as  at  all  of  the  points  where 
P(uT)=0.  And,  of  course,  DM  gives  an  EU  of  1.0  at  all  points.  Note  that 
the  points  for  which  CWT  and  IM  give  1.0  all  require  some  form  of  certainty, 
either  that  a  particular  category  is  the  case  or  that  a  particular  category 
is  not  the  case. 

Influence of instructor's prior on effectiveness. For CWT, the closer
a prior is to one of the three corners, the larger EU is (see Figure 2).
For IM, the closer a prior is toward the corner for which P(uT)=1.0 or
toward the line for which P(uT)=0, the larger EU is. This means that for
CWT and IM, the instructor may be able to use background information on a
particular student in conjunction with his observations on that student
during the lesson to increase his EU for the student.

By observing students during a lesson and by connecting his observations
with background information on the students, the instructor may get some
idea of what percentage of the group will fall in each category. He could
use this information to obtain a single prior which would be used for each
student. His effectiveness in classification might be very good, but there
is room for objection to his use of a single prior since it is, in effect,
using a group average to classify individual students. He could remedy
this situation by recalling what he had observed about particular
students and attempting to assign a prior to each student reflecting
his feelings about the state of knowledge of that particular student.
And further, he could use his feelings concerning the group as a whole
to check the coherence (de Finetti, 1937) of his priors for individuals.

Thus  the  instructor  may  be  able  to  improve  the  effectiveness  of  CWT 
and  IM  by  obtaining  relevant  background  information  on  his  students  and 
by  observing  them  during  the  lesson.  Certainly,  this  is  an  improvement 
over  approaches  which  use  only  part  of  the  available  information  to 
classify  students  and  which  use  that  information  to  classify  a  student 
not  in  terms  of  his  absolute  performance  but  in  terms  of  the  performance 
of  some  group  of  which  he  is  a  member. 

Since  relevant  information  about  individual  students  is  essential 
to  CWT  and  IM,  but  not  to  DM,  it  is  easy  to  see  the  contribution  that 
DM  can  make  in  situations,  conforming  to  our  assumptions,  in  which  the 
person  making  the  classifications  may  not  be  on  hand  to  observe  the 
student, e.g., self-instruction, instruction by television; or in which
there  are  large  numbers  of  students  in  a  class  thereby  handicapping  the 
instructor  in  obtaining  information  about  individual  students.  But 
regardless  of  how  much  information  the  instructor  is  able  to  obtain 
about  his  students,  his  performance  with  CWT  and  IM  will  never  be  better  than 
with  DM,  for  the  situation  we  are  considering. 

Our comments on the instructor's prior point up the fact that there
is information other than answers to test items which can be taken into
account in classifying students. To the extent that this information
can increase the instructor's certainty about the state of a student,
CWT and IM increase in effectiveness, but for the situation we have
defined, CWT and IM are never more effective than DM. Thus, if cost,
in conjunction with effectiveness, justifies the use of DM, we can skirt
the whole issue of the instructor's probabilities, since the information
incorporated in them is superfluous.*

The reader should be clear on the reason that IM is less effective
for most conditions than DM. The reason does not lie in the area of the
instructor's subjective probabilities. The derivation of both IM and
DM involved the instructor's subjective probabilities. The difference is
in the conditional probabilities yielded by IM as opposed to those yielded
by DM. The conditional probabilities of DM simply supply more information
than those of IM. Thus, the fact that IM is less effective than DM
cannot be taken as a deprecation of subjective probabilities. And, of
course, the adoption of DM would not eliminate subjective probabilities
from our consideration since the student's subjective probabilities are
basic to the direct method.

As  n  gets  larger,  the  prior  probabilities  of  the  instructor  become 
less  important  in  the  case  of  most  priors  and  the  effectiveness  of  IM 
approaches  that  of  DM.  And,  as  we  have  seen,  the  approach  of  IM  to  DM  in 
terms of performance is quite rapid so that n does not have to be very
large  for  IM  to  approximate  DM  (See  Table  7).  This  brings  us  to  the 
question  of  independence  of  trials. 

* It should be noted that DM eliminates the need for the instructor's
prior probabilities and thus allows a larger class to be taught with
no loss in effectiveness; this economic benefit of DM should certainly
offset the slight additional cost of testing with DM.


Independence of trials. We have assumed for IM that the test items
for a given concept are regarded by the student as being independent.
This assumption would seem to put an extreme restriction on the possible
applications of our results for IM. It is difficult to picture real-life
situations in which we can be sure that the answer to a question will not
affect a subsequent answer to the same question, especially if it is asked
again immediately.

Since, in our model, students who are either trained or mistrained
would always give the same answer to repetitions of the question as they
gave the first time it was asked while students who are untrained would
not, it is the untrained student for whom the independence assumption
is critical. Suppose, for example, that an untrained student followed
the strategy, which could be optimal in terms of his formulation of the
task, of giving the same answer to a particular question each time it is
repeated that he gave the first time it was asked. This means that only
the first trial would have any value in classifying the student and that
the results for the n corresponding to the number of times the question
was asked would be misleading. Thus, the results we have derived for IM,
for n>1, apply only if the trials are independent.

This means that IM is restricted to situations for which the trials
are independent or that additional assumptions must be made in order to
handle the case for n>1. But DM is applicable without further assumptions
regardless of whether the trials are independent.
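The effect of a violated independence assumption can be made concrete with a small Python sketch. It compares the posterior P(T|C_n) that the instructor computes under independent guessing with the posterior that actually holds if every untrained student repeats his first answer, so that P(C_n|uT) stays at .5 for every n. The function names and the example prior are assumptions chosen for illustration.

```python
def p_trained_given_all_correct(p_t, p_ut, p_cn_given_ut):
    """P(T|C_n) when an untrained student produces C_n with probability p_cn_given_ut."""
    return p_t / (p_t + p_cn_given_ut * p_ut)


def compare(n, p_t, p_ut):
    """Return (posterior assuming independent trials, posterior under answer repetition)."""
    independent = p_trained_given_all_correct(p_t, p_ut, 0.5**n)
    repeating = p_trained_given_all_correct(p_t, p_ut, 0.5)  # only trial 1 is informative
    return independent, repeating


assumed_n8, actual_n8 = compare(8, 0.4, 0.4)
```

With this hypothetical prior and eight repetitions, the instructor would compute a posterior above .99 for T, while the posterior under the repetition strategy remains at 2/3, the n=1 value.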

Summary. Now let us summarize our conclusions regarding the effectiveness
of the two methods from the standpoint of the instructor. We will do
so in terms of three values of n: n=0, n=1, and n → ∞. Remember that
we are not taking into account the cost of using the methods.

For n=0, DM is at least as good as classifying without testing. The
two procedures are equivalent only when the instructor is certain that a
student is in a particular one of the three categories. In cases where
certainty is lacking and the instructor has little relevant non-test
information on the student, DM does much better than CWT.

For n=1, DM is at least as good as IM for all conditions. IM is
equivalent to DM only when there is certainty that the student is untrained
or when there is certainty that he is not untrained. And here again, since
the prior probabilities are important for IM, DM will do much better than
IM when certainty is lacking and the instructor has little relevant non-test
information on the student.

As n gets large, the role of the prior probabilities lessens and the
effectiveness of IM increases. As n → ∞, IM approaches DM in effectiveness.
But if more than one question is used for IM, the trials must be independent
or more assumptions must be made in order for the results of IM to be
meaningful. Of course, this necessity for independence does not apply to
DM, since only one question is necessary in order to give perfect
classification. Thus, if the instructor does not know whether the independence
assumption applies in a situation and he does not have enough non-test
information to tell him for certain which state a given student is in, then
DM will outperform IM. We should also note that for IM test situations in
which a very large number of questions are asked, the cost of using IM
will finally come into play even if it is negligible for small n's.


5.2 Effectiveness from an Outside Agent's Point of View


We  have  discussed  the  effectiveness  of  the  two  test  methods  from 
the  point  of  view  of  the  instructor  who  is  classifying  students.  Now  we 
want  to  look  at  the  same  question  from  the  point  of  view  of  an  outside 
agent  who  knows  the  actual  state  of  each  student  at  the  time  the  student 
is  classified.  The  outside  agent  is  in  a  position  to  evaluate  an  instructor 
and thus to evaluate IM given the instructor, in terms of information in
addition to that which the instructor has.* We might point out that for
our  purpose  it  does  not  matter  whether  there  is  an  agent  who  actually 
possesses  a  knowledge  of  the  category  of  each  student,  since  the  conclusions 
we  draw  will  be  the  same  whether  or  not  anyone  actually  has  this  knowledge. 

The first step in the agent's procedure is to classify students who
have already been classified by the instructor. Whereas the instructor
classified on the basis of T, uT, and mT, the agent classifies on the
basis of the particular prior distribution the instructor used for a
given student. We will represent an instructor's prior distribution
by P, where P is the vector [P(T), P(uT), P(mT)]. As we have seen, Figure
1 shows all of the possible priors. Once the agent has classified students
in terms of P, he can find the relative frequency with which the students,
for whom a particular P was used, actually fell in T, uT, and mT. We will
designate this relative frequency distribution by F, where F is the vector
[F(T), F(uT), F(mT)] and where F(T), F(uT), F(mT) designate the proportion
of students, classified by the instructor, who are actually trained,
untrained, and mistrained, respectively.

* Note that the agent need only be concerned with IM, not DM, since DM is
independent of the instructor and guarantees an EU of 1.0 for all conditions.

Now  the  agent  is  in  a  position  to  ask  the  following  question:  "What 
would  the  instructor’s  average  expected  utility  be  if  the  students  for 
whom he uses P are actually distributed according to F?" We will designate
this average expected utility as EU(P|F). Now let us see how we can obtain
this  average  expected  utility. 

We have seen, in the case of IM, that the optimal pattern of acts
for the possible data results depends on the prior distribution used by
the instructor (see Table 6). According to the results given in this
table, one of four distinct patterns of action is optimal for each
possible prior, i.e., for each P. Thus, for a particular P, an instructor
can use the results of Table 6 to determine the pattern of acts, given
the data, which will maximize his average expected utility. If the
instructor gives the pattern of acts associated with P, when F is the
relative frequency distribution of the actual states of the students for
whom P is used, then the instructor would be expected to obtain EU(P|F)
per student rather than EU(a*). Thus, at any point, the agent has an
index, EUF = EU(P|F), of the instructor's performance so far.

It may be helpful at this point to distinguish between EUF and the
actual proportion of correct classifications that the instructor has
made for a given P at the time the agent is evaluating his performance.
The actual proportion of correct classifications depends, at any point,
on the distribution of data results generated by the students who are
actually untrained. We have seen that in the long run this distribution
will be a function of a binomial distribution. But the data results
will not, in general, be generated systematically. In other words, they
will not have the form of our theoretical distribution at every point.
For example, there may be a run of C_n's so that at a given point many
more C_n's have been given by untrained students than our equations would
indicate. Of course, in the long run, the data results generated by
untrained students should approach the values given by our equations.

Thus the actual number of correct classifications an instructor has
made, up to a given point, may not reflect how well he is using the
information available to him. In other words, chance fluctuations in the data
results may make it appear that he is using the available information
better or worse than he actually is. To get rid of this effect, we use
the theoretical values of P(d|S) rather than the actual percentages of
d given S when computing EUF. Thus, EUF gives the amount the instructor
would have made per student, by using P when F is the case, if the data
results had been generated according to our equations up to this point.
And so, arbitrary fluctuations of the data results do not affect the agent's
evaluation of an instructor at a given point.

We have said that each prior can be associated with one of four
distinct patterns of action. Since there are only four possible patterns
of action, the agent needs only four graphs, for a particular value of n,
in order to be able to obtain EUF for any P and F. This is because the
equation for EU(P∈S_i|F) is identical to the EU equation we have already
derived for P∈S_i when F is substituted for P. And further, the equation
for EU applies over the whole F surface for any P∈S_i. The four
relevant equations are:

    EU(P∈S_1|F) = [(2^n - 1)/2^n] F(uT) + F(T),

    EU(P∈S_2|F) = -[1/2^{n-1}] F(uT) + 1.0,

    EU(P∈S_3|F) = F(uT),

    EU(P∈S_4|F) = -(1/2^n) F(uT) + 1.0 - F(T).

Using these four equations, we can construct the four graphs for
any n. Figure 7 shows the graphs for n=1. Notice that P determines
which of the four graphs is relevant for a particular situation. For
example, for a P in one particular section, the agent would refer to the
upper left hand graph. Once the graph is chosen, the relevant point on the
graph is found by taking the point corresponding to F. Also notice that
the range of EUF is from 0 to 1.0 for each of the graphs. In other words,
when the instructor gives P for X students and the frequency distribution
of the actual states of the X students is F, the average expected utility,
EUF, which he could have been expected to make in this situation, could
be anything between 0 and 1.0 depending on P and F.
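The four equations translate directly into a small lookup. The Python sketch below is illustrative only; the function name `euf` and the string labels for the sections are assumptions.

```python
def euf(n, section, f_t, f_ut):
    """EU(P in S_i | F): expected utility per student of the S_i act pattern when
    the students' actual states are distributed as F = (F(T), F(uT), F(mT))."""
    half_n = 0.5**n
    if section == "S1":
        return (1.0 - half_n) * f_ut + f_t
    if section == "S2":
        return 1.0 - 2.0 * half_n * f_ut
    if section == "S3":
        return f_ut
    if section == "S4":
        return 1.0 - half_n * f_ut - f_t
    raise ValueError(f"unknown section: {section!r}")


worst = euf(1, "S3", 1.0, 0.0)   # pattern 'always untrained' used on trained students
best = euf(1, "S3", 0.0, 1.0)    # the same pattern used on untrained students
```

The two example calls show the extremes of the range: the same pattern of acts yields 0 or 1.0 depending only on F.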

To  clarify  the  agent’s  procedure  of  evaluation,  let  us  look  at  an 
example.  Table  3  shows  eight  classifications  by  an  instructor  and  the 
trial-by-trial  evaluation,  by  the  agent,  of  the  instructor  and  thus  of 
IM  given  the  instructor,  ^or  the  first  subject  P  =  (1 . 0 ,0 y0 ) .  (This 
prior  falls  on  the  border  line  between  two  sections.  Tt  will  be 
sufficient  for  the  comparisons  we  are  going  to  make  to  regard  it  as 
being  in  Sp.)  Thus  the  instructor  would  use  the  pattern  ajy  dpy  a $ 
for  the  data  results  Cn>  Crt  and  C ^  respectively.  The  average  expected 
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Figure  7.  The  outside  agent’s  graphs  for  a  one-item  CM  test.  The  dashed  lines 

show  the  average  expected  utility  of  using  a  given  that  the  actual 

states  of  the  students  for  whom  P  is  used  are  distributed  as  F  . 
TABLE 8

Example of trial-by-trial evaluation of instructor by the outside agent
where a trial is the classification of one student by the instructor in
terms of a one-item (n=1) CM test.

(Section = section in which the prior is located for n=1. Category = actual
category of the student. F = current relative frequency distribution for the
instructor's prior.)

         Instructor's prior     Section   Actual     Relative frequency
Trial    P(T)   P(uT)  P(mT)    of prior  category   F(T)   F(uT)  F(mT)     EU     EUF

1.       1.0    .0     .0       S2        T          1.00   .00    .00       1.00   1.00
2.       .9     .1     .0       S1        T          1.00   .00    .00       .95    1.00
3.       1.0    .0     .0       S2        uT         .50    .50    .00       1.00   .50
4.       .3     .4     .3       S2        uT         .00    1.00   .00       .60    1.00
5.       .9     .1     .0       S1        uT         .50    .50    .00       .95    .75
6.       1.0    .0     .0       S2        T          .66    .33    .00       1.00   .66
7.       .8     .1     .1       S2        T          1.00   .00    .00       .90    1.00
8.       1.0    .0     .0       S2        mT         .50    .25    .25       1.00   .75



utility, EU, for the instructor is 1.0. This is given in the next to
last column. The actual state of the student is T. Thus, the current
relative frequency distribution for P = (1.0, 0, 0) is F = (1.0, 0, 0).
And the average expected utility, EUF, of using the pattern given by P
when F is the case is 1.0.

For the second student, the instructor uses a different prior. Note
that EU is less than EUF here. In other words, if all of the students
for whom the instructor used this prior were trained, the instructor
would classify them all correctly in the long run in spite of the fact
that his EU is merely .95. This is because if all of the students were
trained, only the data result Cn would be generated and, with this prior,
the optimal strategy is to call the student trained when Cn is observed.
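
The relation between F and EUF used in this trial can be written out as a short worked equation. This is a sketch only: the symbol q_P(s) is notation introduced here (it does not appear in the report) for the probability that the pattern selected by prior P classifies a student whose actual state is s correctly, and reading EUF as the F-weighted average of these probabilities is an assumption that matches the description of trial 2.

```latex
% Assumed reading: EUF is the F-weighted average of the per-state
% probabilities of correct classification under the pattern chosen by P.
\[
  \mathrm{EUF}(P, F) \;=\; \sum_{s \in \{T,\; uT,\; mT\}} F(s)\, q_P(s)
\]
% Trial 2: P = (.9, .1, 0), F = (1.00, .00, .00), and q_P(T) = 1 because a
% trained student generates only the result for which the pattern says "trained".
\[
  \mathrm{EUF} \;=\; 1.00 \cdot 1 \;+\; .00 \cdot q_P(uT) \;+\; .00 \cdot q_P(mT) \;=\; 1.00
\]
```

On this reading, EUF equals 1.0 whenever F puts all of its weight on states that the pattern always classifies correctly, whatever value EU takes.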

On trial three, the instructor uses P = (1.0, 0, 0) again. But this
time the student is actually untrained. For n=1 an untrained student
can produce either Cn or CQ. This means that there is a possibility of
conflict between the instructor's prior and the data result, since the
instructor has expressed certainty that the student is trained. If Cn
is obtained from the untrained student, then the instructor will not be
aware of the conflict. His EU will be 1.0. But since this student is
untrained, the instructor will be unable to correctly classify all students
for this prior. If CQ is obtained the instructor will either have to
re-evaluate his prior or ignore the data. If the instructor in Table 8
obtained a Cn for trial 3, or if he obtained a CQ and ignored it, his
EUF would be .5.*

*  Our  comments  on  the  first  three  trials  can  be  used  as  an  aid  in  examining 
the  remaining  trials  in  the  table. 
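
As an aid for checking the EU column on the remaining trials, the following Python sketch computes the instructor's expected utility from his prior. It rests on assumptions that are not stated in this section: a trained student always answers the single item correctly, an untrained student answers correctly with probability 1/2, a mistrained student always answers incorrectly, and a correct classification earns utility 1 and an incorrect one 0. The function name `expected_utility` and the parameter `p_correct` are hypothetical. Under these assumptions the sketch reproduces the EU values .95, .60, .90 and 1.00 listed in the table.

```python
def expected_utility(prior, p_correct=(1.0, 0.5, 0.0)):
    """Instructor's EU for a one-item test with 0/1 utility.

    prior     -- (P(T), P(uT), P(mT)), the instructor's prior
    p_correct -- ASSUMED probability of a correct answer for each state
                 (trained, untrained, mistrained); not taken from the report
    """
    # Joint probability of each state together with a correct answer,
    # and of each state together with an incorrect answer.
    joint_correct = [p * c for p, c in zip(prior, p_correct)]
    joint_incorrect = [p * (1.0 - c) for p, c in zip(prior, p_correct)]
    # After either data result the instructor names the most probable state,
    # so his expected utility is the sum of the two largest joint terms.
    return max(joint_correct) + max(joint_incorrect)


if __name__ == "__main__":
    for prior in [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0),
                  (0.3, 0.4, 0.3), (0.8, 0.1, 0.1)]:
        print(prior, round(expected_utility(prior), 2))
```

The sketch covers only the instructor's own EU. The agent's EUF also depends on the actual distribution F of student states, which the instructor does not know.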


Table 9 shows the summary measures for Table 8. The instructor has
used P = (1.0, 0, 0) four times and the actual states of the students have

TABLE 9

Summary measures of Table 8 showing instructor's performance
to date from viewpoint of outside agent.

(Section = section in which the prior is located. F = current relative
frequency distribution for the instructor's prior.)

Instructor's prior     Section   Relative frequency
P(T)   P(uT)  P(mT)    of prior  F(T)   F(uT)  F(mT)     EU     EUF

1.0    .0     .0       S2        .50    .25    .25       1.00   .75
.9     .1     .0       S1        .50    .50    .00       .95    .75
.3     .4     .3       S2        .00    1.00   .00       .60    1.00
.8     .1     .1       S1        1.00   .00    .00       .90    1.00

been distributed as F = (.50, .25, .25). Thus, as far as the instructor
is concerned, his average EU for the four trials is 1.0. But from the
standpoint of the agent, it is .75. This points up a difference between
DM and CM. If the instructor had used DM, he would have been guaranteed
the correct classification of all four students. But with IM, he is not
guaranteed the correct classification of each student, even though
P = (1.0, 0, 0) and EU = 1.0.
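
The bookkeeping behind the summary table can be sketched in a few lines of Python. The sketch groups the eight trials by the prior the instructor used, then computes the relative frequency distribution F of actual student states and the instructor's mean EU for each prior. The trial data are transcribed from the trial-by-trial table; the names `TRIALS`, `STATES` and `summarize` are illustrative and not from the report. The EUF column is not computed, since it also depends on the classification pattern attached to each prior.

```python
from collections import Counter, defaultdict

STATES = ("T", "uT", "mT")

# (instructor's prior, actual state of student, instructor's EU), one per trial.
TRIALS = [
    ((1.0, 0.0, 0.0), "T", 1.00),
    ((0.9, 0.1, 0.0), "T", 0.95),
    ((1.0, 0.0, 0.0), "uT", 1.00),
    ((0.3, 0.4, 0.3), "uT", 0.60),
    ((0.9, 0.1, 0.0), "uT", 0.95),
    ((1.0, 0.0, 0.0), "T", 1.00),
    ((0.8, 0.1, 0.1), "T", 0.90),
    ((1.0, 0.0, 0.0), "mT", 1.00),
]


def summarize(trials):
    """Return {prior: (F, mean EU)} where F is the relative frequency of states."""
    counts = defaultdict(Counter)
    eu_sum = defaultdict(float)
    for prior, state, eu in trials:
        counts[prior][state] += 1
        eu_sum[prior] += eu
    summary = {}
    for prior, state_counts in counts.items():
        n = sum(state_counts.values())
        freq = tuple(state_counts[s] / n for s in STATES)
        summary[prior] = (freq, eu_sum[prior] / n)
    return summary


if __name__ == "__main__":
    for prior, (freq, mean_eu) in summarize(TRIALS).items():
        print(prior, freq, round(mean_eu, 2))
```

For the prior (1.0, 0, 0) this gives F = (.50, .25, .25) and a mean EU of 1.0, the values discussed above.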

Thus, we see that IM involves more uncertainty than DM. And the
additional uncertainty in IM comes from the fact that IM is dependent
on the prior probabilities of the instructor whereas DM is not. If it
were known for certain that an instructor's P was equivalent to F, then
there would be no more uncertainty concerning IM, for that instructor,
than there is for DM. And EU could be interpreted as both the instructor's
average expected utility for that trial and the earnings per trial or the
proportion of correct classifications per trial which could be expected in
the long run. In other words, EU and EUF would be equivalent. Under these
circumstances, we could say that there are certain conditions for which IM
and DM are equivalent, namely, the conditions for which the instructor is
certain that the student is untrained or certain that he is not untrained.
And, of course, CM would be equivalent to DM for the cases in which the
instructor gives a prior probability of 1.0 to a particular category. Of
course, these equivalences are from the agent's point of view.

But if P and F are not equivalent for an instructor, then EU and
EUF will not, in general, be equivalent. Thus, we cannot, without making
further assumptions, say what level of effectiveness we can expect from IM
for a given instructor. But we do know that it can be no greater than 1.0
regardless of the relation of P to F.*

5.3 Summary

It  seems  clear,  after  having  compared  the  effectiveness  of  the  two 
methods  both  from  the  instructor's  point  of  view  and  from  an  outside  agent’s 
point  of  view,  that  the  direct  method  is  more  effective  than  the  indirect 
method  for  all  conditions,  aside  from  the  question  of  the  cost  of  use. 

From  the  instructor’s  point  of  view,  there  are  conditions  for  which  the 
two  methods  give  equivalent  results.  But  from  the  agent's  point  of  view 

* The questions raised in this section concerning the subjective probabilities
of the instructor are analogous to questions which will become relevant in
terms of the student's subjective probabilities when we begin to look at
situations in which the student's subjective probabilities can be values
anywhere in the interval [0, 1.0].


we see that there is uncertainty involved in IM which is not involved
in DM, viz., we are never certain that the instructor's P and F are
related in such a way that EU and EUF are both 1.0.

We have seen also that the effectiveness of IM can be improved up
to a limit of EU=1 if the instructor has relevant non-test information
on the student and/or if a question is repeated. But regardless of the
amount of additional information, the effectiveness of IM can never be
greater than that of DM, since EU for DM is 1.0 for all conditions. We
also noted that repeated questions are valid for IM, in our situation,
only if the questions are treated by the student as being independent.